

Search for: All records

Creators/Authors contains: "Wang, Junmei"


  1. Abstract Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives. 
    Free, publicly-accessible full text available May 1, 2026
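The evaluation metrics named in the abstract above (accuracy, sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal sketch. The function names and the toy labels are hypothetical and are not taken from the paper; the label convention (1 = aggregator, 0 = nonaggregator) is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels.

    Assumed convention: 1 = aggregator (positive), 0 = nonaggregator.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print(classification_metrics(labels, predicted))
    print(roc_auc(labels, scores))
```

The pairwise AUC formulation is quadratic in the number of samples, which is fine for a toy example but would likely be replaced by a rank-based computation for datasets of the size the abstract describes.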
  2. Abstract The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturated substrates into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most of the existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol to construct Kcat prediction models, introducing an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R² value of 0.54 versus 0.50) in comparison with Li et al.’s model using the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R² value of 0.64 for the focused model. The high-quality performance and expandability of the model support its broad application in enzyme engineering and drug research and development. 
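The model comparison above is reported as R² values. The abstract does not say which definition of R² was used, so the sketch below assumes the coefficient of determination (one minus the ratio of residual to total sum of squares); the function name and the sample numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the R² definition intended in the abstract;
    a squared Pearson correlation would give different values.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented values standing in for measured and predicted Kcat data.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 3.0, 3.5]
    print(r_squared(measured, predicted))
```

Under this definition a model that always predicts the mean of the measured values scores 0, and a model worse than that scores below 0.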
  3. Abstract Intranasal diamorphine (IND), approved for managing breakthrough pain in the UK, has been identified as an acceptable alternative offering effective, expedient, and less traumatic analgesia for children. However, the current dose regimen in pediatric populations relies on clinical expertise, while the pharmacokinetic properties are poorly understood. This study aimed to develop diamorphine population pharmacokinetics (pop‐PK) models and simulate IND dosing in virtual pediatric subjects. An integrated four‐compartment pop‐PK model with first‐order absorption and elimination provided an appropriate fit and characterized 385 publicly available concentration measurements of diamorphine, 6‐monoacetylmorphine, and morphine collected from adults. Body weight allometry and renal function maturation (age) were incorporated into the final model as two covariates. The estimated IND relative bioavailability was around 52% compared with intramuscularly injected diamorphine. Using this final model, the plasma concentrations of morphine, the active metabolite for pain relief, were simulated in virtual subjects. The utility of model extrapolation was supported by external verification with acceptable average fold errors of 1.06 ± 0.30 and 0.83 ± 0.07 for morphine maximum concentration and exposures. Meanwhile, the simulated morphine concentration–time profiles could recover the PK profiles observed in children after a single dose of IND. The model‐based dosing simulations were therefore assessed in four pediatric age groups to match the therapeutic window of morphine concentrations in steady state (10–20 μg/L). Our study demonstrates that the dose regimen of 0.3 mg/kg loading dose plus 0.1 mg/kg hourly maintenance dose is generally appropriate for multiple pediatric populations with breakthrough pain from a PK perspective. 
    Free, publicly-accessible full text available March 1, 2026
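The weight-based regimen stated in the abstract above (0.3 mg/kg loading dose plus 0.1 mg/kg hourly maintenance dose) reduces to simple arithmetic, sketched below. The function name is hypothetical, and the timing (first maintenance dose one hour after the loading dose) is an assumption, since the abstract does not give it. This only illustrates the arithmetic and does not reproduce the pop‐PK model.

```python
def ind_dose_schedule(weight_kg, hours,
                      loading_mg_per_kg=0.3, maintenance_mg_per_kg=0.1):
    """Return a list of (time_h, dose_mg) pairs.

    Assumption: loading dose at t = 0, then one maintenance dose at each
    whole hour from t = 1 through t = hours.
    """
    schedule = [(0, loading_mg_per_kg * weight_kg)]
    for t in range(1, hours + 1):
        schedule.append((t, maintenance_mg_per_kg * weight_kg))
    return schedule


if __name__ == "__main__":
    # Hypothetical 20 kg child followed for 3 hours.
    for time_h, dose_mg in ind_dose_schedule(20.0, 3):
        print(f"t = {time_h} h: {dose_mg:.2f} mg")
```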
  4. Free, publicly-accessible full text available December 9, 2025
  5. ABSTRACT Accurate prediction of protein–peptide complex structures plays a critical role in structure‐based drug design, including antibody design. Most peptide‐docking benchmark studies were conducted using crystal structures of protein–peptide complexes; as such, the performance of current peptide docking tools in a practical setting is unknown. Here, the practical setting means that no crystal or other experimental structures are available for the complex, the receptor, or the peptide. In this work, we have developed a practical docking protocol that incorporates two well-known machine learning models, AlphaFold 2 for structural prediction and ANI‐2x for ab initio potential prediction, to achieve a high success rate in modeling protein–peptide complex structures. The docking protocol consists of three major stages. In the first stage, the 3D structure of the receptor is predicted by AlphaFold 2 using the monomer mode, and that of the peptide is predicted by AlphaFold 2 using the multimer mode. We found that it is essential to include the receptor information to generate a high‐quality 3D structure of the peptide. In the second stage, rigid protein–peptide docking is performed using ZDOCK software. In the last stage, the top 10 docking poses are relaxed and refined by ANI‐2x in conjunction with our in‐house geometry optimization algorithm, conjugate gradient with backtracking line search (CG‐BS). We developed CG‐BS to perform geometry optimization more efficiently; it takes the potential and force directly from ANI‐2x machine learning models. The docking protocol achieved very encouraging performance for a set of 62 very challenging protein–peptide systems, with an overall success rate of 34% if only the top 1 docking poses were considered. This success rate increased to 45% if the top 3 docking poses were considered. We emphasize that this encouraging protein–peptide docking performance was achieved without using any crystal or experimental structures. 
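The abstract above names a conjugate gradient method with backtracking line search but gives no algorithmic detail. The sketch below is a generic textbook variant, not the authors' CG‐BS: it assumes a Polak-Ribière update with restarts and an Armijo sufficient-decrease test, and it minimizes a toy quadratic in place of the ANI‐2x potential. All names and parameter values are hypothetical.

```python
def cg_backtracking(f, grad, x0, max_iter=200, tol=1e-8, c1=1e-4, shrink=0.5):
    """Nonlinear conjugate gradient with Armijo backtracking line search."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]  # first direction: steepest descent
    for _ in range(max_iter):
        g_norm2 = sum(gi * gi for gi in g)
        if g_norm2 ** 0.5 < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:
            # Not a descent direction: restart with steepest descent.
            d = [-gi for gi in g]
            slope = -g_norm2
        # Backtracking: shrink the step until the Armijo condition holds.
        step, fx = 1.0, f(x)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + c1 * step * slope or step < 1e-12:
                break
            step *= shrink
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero (an assumed choice).
        beta = max(0.0, sum(gn * (gn - go)
                            for gn, go in zip(g_new, g)) / g_norm2)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


def toy_energy(x):
    """Toy stand-in for a potential energy surface, minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def toy_gradient(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


if __name__ == "__main__":
    print(cg_backtracking(toy_energy, toy_gradient, [0.0, 0.0]))
```

In a real geometry optimization, `f` and `grad` would be supplied by the energy model (energies and negative forces), and backtracking keeps the number of energy evaluations per step small compared with an exact line search.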
  6. Structure-based virtual screening utilizes molecular docking to explore and analyze ligand–macromolecule interactions, which is crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, the accurate prediction of binding affinity and binding mode remains challenging. In this study, we introduced a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which is capable of restraining and constraining rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, renowned for precise molecular energy predictions that resemble those of the ωB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule–macromolecule and 12 peptide–macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power, not only optimizing binding poses more effectively, particularly when the RMSD of the binding pose predicted by Glide exceeded around 5 Å, but also achieving a 26% higher success rate in identifying native-like binding poses at the top rank compared to Glide docking. As for the scoring and ranking powers, ANI-2x/CG-BS demonstrated enhanced performance in predicting and ranking hundreds or thousands of ligands over Glide docking. For example, Pearson’s and Spearman’s correlation coefficients increased remarkably from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, with the addition of ANI-2x/CG-BS for optimizing and ranking small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for integration into virtual screening pipelines due to its enhanced docking performance. 
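The scoring and ranking powers above are reported as Pearson's and Spearman's correlation coefficients. A minimal sketch of both follows, using the standard definitions (Spearman as the Pearson correlation of average ranks). The function names and the sample data are hypothetical and unrelated to the paper's results.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x ** 0.5 * var_y ** 0.5)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented scores and affinities, for illustration only.
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    measured = [1.0, 4.0, 9.0, 16.0, 25.0]
    print(pearson(predicted, measured), spearman(predicted, measured))
```

Pearson measures linear agreement between predicted and measured values (scoring power), while Spearman depends only on the ordering (ranking power), which is why the two can differ substantially for the same predictions.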
  7. The axon guidance cue netrin-1 signals through its receptor DCC (deleted in colorectal cancer) to attract commissural axons to the midline. Variants in DCC are frequently associated with congenital mirror movements (CMMs). A CMM-associated variant in the cytoplasmic tail of DCC is located in a conserved motif predicted to bind to a regulator of actin dynamics called the WAVE (Wiskott-Aldrich syndrome protein–family verprolin homologous protein) regulatory complex (WRC). Here, we explored how this variant affects DCC function and may contribute to CMM. We found that a conserved WRC-interacting receptor sequence (WIRS) motif in the cytoplasmic tail of DCC mediated the interaction between DCC and the WRC. This interaction was required for netrin-1–mediated axon guidance in cultured rodent commissural neurons. Furthermore, the WIRS motif of Fra, the Drosophila DCC ortholog, was required for attractive signaling in vivo at the Drosophila midline. The CMM-associated R1343H variant of DCC, which altered the WIRS motif, prevented the DCC-WRC interaction and impaired axon guidance in cultured commissural neurons and in Drosophila. The findings reveal the WRC as a pivotal component of netrin-1–DCC signaling and uncover a molecular mechanism explaining how a human genetic variant in the cytoplasmic tail of DCC may lead to CMM. 